A modeling approach to the evaluation of internal sorting methods

Authors

  • S. Sitharama Iyengar
  • Dale R. Barrett
Abstract

The purpose of this paper is to report a modeling approach to the evaluation of internal sorting methods. The technique used is regression modeling, which has been found to be a very fast statistical method of evaluation that relies on performance data collected from the system being evaluated. The parameters considered for evaluation are: (1) number of stages, (2) number of transfers, (3) number of records, (4) sort time, and (5) number of comparisons. An empirical model has been developed for sorting time as a function of the number of stages, number of records, number of comparisons, and number of transfers. The correlation coefficient obtained during modeling averaged 0.96 and was found to be statistically significant.

*Presently at TRW, Inc., Redondo Beach, California.
**Presently at Louisiana State University, Baton Rouge, Louisiana 70803.

1. INTRODUCTION

Much work has gone into finding efficient methods of sorting, since it is an important part of many large business data-processing problems. Knuth [1] gives a detailed description of searching and sorting, and the algorithms described by him are simple but authoritative. He notes that sorting can be classified generally into internal sorting, in which the records are kept in the computer's high-speed random-access memory, and external sorting, when there are more records than can be held in memory at once. Internal sorting allows more flexibility in the structuring and accessing of the data, while external sorting shows us how to live with rather stringent accessing constraints.

The six different sorting techniques for our purpose are insertion sort, shell sort, quick sort, bubble sort, and tree sort. We shall present a brief description of each of these sorting methods (for details see Ref. [1]):
©Elsevier North Holland, Inc., 1980
52 Vanderbilt Ave., New York, NY 10017    0020-0255/80/08079-20$01.75

Insertion sort. The items are considered one at a time, and each new item is inserted into the appropriate position relative to the previously sorted items. (This is the way many bridge players sort their hands, picking up one card at a time.)

Exchange sort. If two items are found to be out of order, they are interchanged. This process is repeated until no more exchanges are necessary.

Selection sort. First the smallest (or perhaps the largest) item is located, and it is somehow separated from the rest; then the next smallest (or the next largest) is selected, and so on.

Merge sort. Merging (or collating) means the combination of two or more ordered files into a single ordered file.

Distribution sort. Readers who are familiar with punched-card equipment are well aware of the efficient procedure used on card sorters, based on the digits of the keys; the same idea can be adapted to computer programming, and it is generally known as "radix sorting," "digital sorting," or "pocket sorting."

According to the above definitions, the technique called "insertion sorting" in the previous report¹ is really a selection sort and will be so called in this report. The tree sort is considered a type of selection sort. The technique previously called selection sort is actually the Shell (diminishing-increment) sort. The shell sort procedure is a variation of the insertion technique. The bubble sort is considered one of the exchange procedures, as is the quick sort.

Algorithms for these five different procedures were obtained from D. E. Knuth [1] and also from A. T. Berztiss [5]. Knuth suggests that the quick sort method could be enhanced if "subfiles of M or fewer elements are left unsorted until the very end of the procedure, then a single pass of straight insertion is used to produce the final ordering."
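Knuth's suggestion quoted above can be illustrated with a minimal Python sketch. It is not the authors' program: the function names, the middle-element pivot, the explicit stack, and the default cutoff `m=9` are all assumptions chosen for illustration. The sketch partitions until every subfile has `m` or fewer elements, leaves those subfiles unsorted, and then makes one straight-insertion pass over the whole file.

```python
def straight_insertion(a):
    """Sort the list a in place by straight insertion."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift larger items one place right to open a slot for key.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key


def quick_sort_with_cutoff(a, m=9):
    """Quick sort that skips subfiles of m or fewer elements,
    then finishes with a single straight-insertion pass.

    m is a hypothetical tuning parameter (illustrative default).
    """
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if hi - lo + 1 <= m:
            continue  # small subfile: leave it unsorted for now
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi
        # Hoare-style partition around the pivot value.
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        stack.append((lo, j))
        stack.append((i, hi))
    # After partitioning, every item is within a short distance of its
    # final position, so this single pass does little work per item.
    straight_insertion(a)


records = [31, 4, 15, 92, 65, 35, 89, 79, 32, 38, 46, 26, 43]
quick_sort_with_cutoff(records, m=4)
```

Because the final pass is a complete straight insertion sort, the result is correctly ordered for any cutoff; the cutoff only changes how the work is split between partitioning and insertion.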
The algorithm for straight insertion was therefore used as a separate procedure as well as a procedure to be used in conjunction with quick sort.

The remainder of the paper is organized as follows: general description of the model and its design, description of the input data to the model, calibration of the model, discussion of the results, model predictions, and conclusions.

¹"A Note on Comparison of Internal Sorting Methods," an unpublished paper by S. Sitharama Iyengar and Wendall Ingrain.

2. GENERAL DESCRIPTION OF THE MODEL AND ITS DESIGN

A regression model [6, 7] is considered a fast statistical model of system performance which relies on the performance data collected from the system being evaluated. In view of the development of efficient algorithms described in Knuth's book [1], an empirical model to evaluate the sort time as a function of number of stages, number of comparisons, number of transfers, and number of records will enhance the process of evaluation of internal sorting methods.

Before we go into the formulation of the model, we shall discuss the system parameters: the number of stages, number of comparisons, number of transfers, and number of records. The number of stages is how many times the sort method must cycle before completion. The storage ratio is the ratio of the number of storage locations to the number of elements to be sorted. The number of transfers is an indication of the model's activity. The model we are proposing in our paper will be of the following form:

$$\theta = f(N_R, D_t, N_s, N_c), \tag{1}$$

where $\theta$ = sort time in sec, $N_R$ = number of records, $D_t$ = number of transfers, $N_c$ = number of comparisons, $N_s$ = number of stages. The formulation process of the model is explained in Sec. 3.2. The above functional relationship can be reduced to the following form after a multiple regression analysis:

$$\theta = C_0 + C_1 N_R + C_2 D_t + C_3 N_s + C_4 N_c. \tag{2}$$

This general model for all types of sort will be obtained after performing multiple regression analysis with $\theta$ as a dependent variable and $N_R$, $D_t$, $N_s$, and $N_c$ as independent variables. $C_1$, $C_2$, $C_3$, $C_4$ are regression coefficients, and $C_0$ is the intercept obtained after the multiple regression analysis. The model sorting-time equation can be expressed as a linear (log-log) function using the SPSS (Statistical Package for the Social Sciences) program, which will be described in the next section of the paper. Then equation (2) can be expressed in the following form:

$$\log \theta = C_0 + C_1 \log N_R + C_2 \log D_t + C_3 \log N_s + C_4 \log N_c, \tag{3}$$
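The log-log form of the model can be fitted by ordinary least squares, as in the following Python sketch. It is not the SPSS procedure used in the paper. The function names, the base-10 logarithm, the normal-equation solver, and the synthetic data and coefficients are all assumptions made for illustration; none of the numbers come from the paper.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_log_log(rows, times):
    """Least-squares fit of log10(time) = C0 + sum(Ci * log10(x_i)).

    rows:  list of (N_R, D_t, N_s, N_c) tuples, all positive.
    times: list of positive sort times in seconds.
    Returns [C0, C1, C2, C3, C4].
    """
    design = [[1.0] + [math.log10(v) for v in row] for row in rows]
    y = [math.log10(t) for t in times]
    k = len(design[0])
    # Normal equations: (X^T X) c = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_time(coeffs, row):
    """Evaluate the fitted log-log model and return time in seconds."""
    log_t = coeffs[0] + sum(c * math.log10(v)
                            for c, v in zip(coeffs[1:], row))
    return 10.0 ** log_t


# Synthetic, noise-free demonstration data (hypothetical coefficients).
rng = random.Random(1)
true_coeffs = [-5.0, 0.2, 0.3, 0.1, 0.4]
rows = [tuple(10.0 ** rng.uniform(1.0, 5.0) for _ in range(4))
        for _ in range(40)]
times = [predict_time(true_coeffs, row) for row in rows]
fitted = fit_log_log(rows, times)
```

The synthetic regressors here are drawn independently so that the fit is well conditioned. Counts measured from a real sort (records, comparisons, transfers, stages) are usually strongly correlated with one another, so a fit on measured data would call for a more robust solver and a check of the coefficient estimates.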

Journal:
  • Inf. Sci.

Volume 22  Issue 

Pages  -

Publication date 1980